53 research outputs found

    Delving into Variance Transmission and Normalization: Shift of Average Gradient Makes the Network Collapse

    Full text link
    Normalization operations are essential for state-of-the-art neural networks and enable training a network from scratch with a large learning rate (LR). We attempt to explain the real effect of Batch Normalization (BN) from the perspective of variance transmission by investigating the relationship between BN and Weights Normalization (WN). In this work, we demonstrate that the shift of the average gradient amplifies the variance of every convolutional (conv) layer. We propose Parametric Weights Standardization (PWS), a fast module for conv filters that is robust to mini-batch size, to solve the shift of the average gradient. PWS can provide the speed-up of BN while requiring less computation and leaving the output of a conv layer unchanged. PWS enables the network to converge fast without normalizing the outputs. This result supports the analysis of the shift of the average gradient and explains why BN works from the perspective of variance transmission. The code and appendix will be made available at https://github.com/lyxzzz/PWSConv.
    Comment: This paper has been accepted by AAAI2
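    As an informal illustration of the general idea of standardizing conv filters with learnable parameters, here is a minimal sketch assuming PyTorch; the exact PWS parametrization in the paper may differ, and the names PWSConv2d, gamma, beta, and eps are illustrative.

```python
# Minimal weight-standardization sketch for conv filters with learnable
# per-filter scale and shift (illustrative; not the paper's exact PWS formula).
import torch
import torch.nn as nn
import torch.nn.functional as F


class PWSConv2d(nn.Conv2d):
    def __init__(self, in_ch, out_ch, kernel_size, eps=1e-5, **kwargs):
        super().__init__(in_ch, out_ch, kernel_size, **kwargs)
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(out_ch, 1, 1, 1))   # learnable scale
        self.beta = nn.Parameter(torch.zeros(out_ch, 1, 1, 1))   # learnable shift

    def forward(self, x):
        w = self.weight
        # standardize every output filter over its (in_ch, kH, kW) parameters
        mean = w.mean(dim=(1, 2, 3), keepdim=True)
        var = w.var(dim=(1, 2, 3), keepdim=True, unbiased=False)
        w_hat = self.gamma * (w - mean) / torch.sqrt(var + self.eps) + self.beta
        return F.conv2d(x, w_hat, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)
```

    Only the filters are transformed here; the layer's outputs are not normalized, in line with the abstract's point that the network converges without normalizing the outputs.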

    Judicial Intelligent Assistant System: Extracting Events from Divorce Cases to Detect Disputes for the Judge

    Full text link
    In the formal procedure of civil cases, the textual materials provided by the different parties describe the development of the case. Extracting the key information from these materials and clarifying the dispute focus of the related parties is a difficult but necessary task. Currently, officers read the materials manually and use methods such as keyword searching and regular-expression matching to find the target information. These approaches are time-consuming and depend heavily on the prior knowledge and carefulness of the officers. To help officers improve their working efficiency and accuracy, we propose an approach to detect disputes in divorce cases based on a two-round-labeling event extraction technique. We implement the Judicial Intelligent Assistant (JIA) system according to the proposed approach to 1) automatically extract focus events from divorce case materials, 2) align events by identifying co-reference among them, and 3) detect conflicts between events brought by the plaintiff and the defendant. With the JIA system, it is convenient for judges to determine the disputed issues. Experimental results demonstrate that the proposed approach and system obtain the focus of cases and detect conflicts more effectively and efficiently than existing methods.
    Comment: 20 page
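    As a rough illustration of the third step, the sketch below compares aligned plaintiff and defendant events for a shared focus; the Event fields and the notion of "conflict" are hypothetical, not the JIA system's actual schema.

```python
# Hypothetical conflict-detection step over events already aligned by co-reference.
from dataclasses import dataclass


@dataclass
class Event:
    focus: str   # e.g. "child custody", "property division" (illustrative)
    party: str   # "plaintiff" or "defendant"
    claim: str   # normalized statement asserted by that party


def detect_conflicts(aligned_pairs):
    """aligned_pairs: list of (plaintiff_event, defendant_event) about the same focus."""
    conflicts = []
    for p_ev, d_ev in aligned_pairs:
        if p_ev.focus == d_ev.focus and p_ev.claim != d_ev.claim:
            conflicts.append((p_ev.focus, p_ev.claim, d_ev.claim))
    return conflicts
```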

    Impact of a Diagnostic Pressure Equation Constraint on Tornadic Supercell Thunderstorm Forecasts Initialized Using 3DVAR Radar Data Assimilation

    Get PDF
    A diagnostic pressure equation constraint has been incorporated into a storm-scale three-dimensional variational (3DVAR) data assimilation system. This diagnostic pressure equation constraint (DPEC) aims to improve dynamic consistency among different model variables so as to produce better data assimilation results and improve the subsequent forecasts. Ge et al. (2012) described the development of DPEC and tested it with idealized experiments. DPEC was also applied to a real supercell case, but only radial velocity was assimilated. In this paper, DPEC is further applied to two real tornadic supercell thunderstorm cases, where both radial velocity and radar reflectivity data are assimilated. The impact of DPEC on radar data assimilation is examined mainly based on the storm forecasts. It is found that the experiments using DPEC generally predict higher low-level vertical vorticity than the experiments not using DPEC near the time of observed tornadoes. Therefore, it is concluded that the use of DPEC improves the forecast of mesocyclone rotation within supercell thunderstorms. Experiments using different weighting coefficients for the constraint generate similar results, suggesting that DPEC is not very sensitive to the choice of weighting coefficient.
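    For context, a weak constraint of this kind typically enters the 3DVAR cost function as an extra penalty term; a schematic form is shown below, where the weighting coefficient corresponds to the sensitivity experiments mentioned above. The exact terms used in the cited system may differ.

```latex
% Schematic 3DVAR cost function with a weak diagnostic pressure equation constraint.
% C_p(x) denotes the residual of the diagnostic pressure equation; lambda_c is its weight.
\[
J(\mathbf{x}) =
\tfrac{1}{2}(\mathbf{x}-\mathbf{x}_b)^{\mathrm{T}}\mathbf{B}^{-1}(\mathbf{x}-\mathbf{x}_b)
+ \tfrac{1}{2}\big(H(\mathbf{x})-\mathbf{y}\big)^{\mathrm{T}}\mathbf{R}^{-1}\big(H(\mathbf{x})-\mathbf{y}\big)
+ \tfrac{\lambda_c}{2}\,\big\lVert C_p(\mathbf{x})\big\rVert^{2}
\]
```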

    Security and Energy-aware Collaborative Task Offloading in D2D communication

    Get PDF
    The device-to-device (D2D) communication technique establishes direct links among mobile devices (MDs) to reduce communication delay and increase network capacity over the underlying wireless networks. Existing D2D schemes for task offloading focus on system throughput, energy consumption, and delay without considering data security. This paper proposes a Security- and Energy-aware Collaborative Task Offloading scheme for D2D communication (Sec2D). Specifically, we first build a novel security model, in terms of the number of CPU cores, CPU frequency, and data size, for measuring the security workload on heterogeneous MDs. We then formulate the collaborative task offloading problem that minimizes the time-average delay and energy consumption of MDs while ensuring data security. To meet this goal, the Lyapunov optimization framework is applied to implement online decision-making. Two solutions, a greedy approach and an optimal approach with different time complexities, are proposed to deal with the resulting mixed-integer linear programming (MILP) problem. Theoretical proofs demonstrate that Sec2D follows an [O(1/V), O(V)] energy-delay tradeoff. Simulation results show that Sec2D guarantees both data security and system stability in the collaborative D2D communication environment.
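    As a generic illustration of the Lyapunov drift-plus-penalty rule that underlies this kind of online decision-making, the sketch below picks, in each slot, the offloading decision minimizing V times the penalty plus the queue-weighted drift; the candidate decisions, queues, and cost functions are hypothetical and not Sec2D's actual model.

```python
# Generic drift-plus-penalty decision rule (illustrative, not Sec2D's formulation).
def drift_plus_penalty_choice(candidates, queues, V, penalty, arrival, service):
    """candidates: feasible offloading decisions for the current slot.
    queues: {queue_id: current backlog Q_i}.
    penalty(d): per-slot cost of decision d (e.g. weighted delay + energy).
    arrival(d, i) / service(d, i): per-queue arrivals and service under decision d."""
    def cost(d):
        drift = sum(q * (arrival(d, i) - service(d, i)) for i, q in queues.items())
        return V * penalty(d) + drift
    return min(candidates, key=cost)
```

    Larger V puts more weight on the time-average penalty at the cost of larger queue backlogs, which is the intuition behind an [O(1/V), O(V)] energy-delay tradeoff.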

    Domain Adaptive Code Completion via Language Models and Decoupled Domain Databases

    Full text link
    Large Language Models (LLMs) have demonstrated remarkable performance in code completion. However, due to the lack of domain-specific knowledge, they may not be optimal at completing code that requires intensive domain knowledge, for example completing library names. Although several works have confirmed the effectiveness of fine-tuning techniques for adapting language models to code completion in specific domains, they are limited by the need to constantly fine-tune the model as the project iterates. To address this limitation, in this paper we propose kNM-LM, a retrieval-augmented language model (R-LM) that integrates domain knowledge into language models without fine-tuning. Unlike previous techniques, our approach is able to automatically adapt to different language models and domains. Specifically, it utilizes the in-domain code to build a retrieval-based database decoupled from the LM, and then combines it with the LM through Bayesian inference to complete the code. Extensive experiments on intra-project and intra-scenario completion confirm that kNM-LM brings appreciable enhancements compared to CodeGPT and UniXcoder. A deep analysis of our tool, covering response speed, storage usage, specific-type code completion, and API invocation completion, confirms that kNM-LM provides satisfactory performance, which renders it highly appropriate for domain-adaptive code completion. Furthermore, our approach operates without requiring direct access to the language model's parameters. As a result, it can seamlessly integrate with black-box code completion models, making it easy to use our approach as a plugin to further enhance their performance.
    Comment: Accepted by ASE202
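    To make the decoupled-database idea concrete, here is a kNN-LM-style sketch in which a retrieval distribution from an in-domain datastore is combined with the base LM's distribution; kNM-LM derives the combination through Bayesian inference rather than the fixed interpolation weight used here, so the functions and the weight lam are purely illustrative.

```python
# Illustrative retrieval-augmented completion step (not kNM-LM's exact method).
import numpy as np


def retrieval_distribution(query_vec, keys, values, vocab_size, k=8, temp=1.0):
    """keys: (N, d) context vectors from in-domain code; values: (N,) next-token ids."""
    dists = np.linalg.norm(keys - query_vec, axis=1)   # distance to every datastore entry
    nearest = np.argsort(dists)[:k]
    weights = np.exp(-dists[nearest] / temp)
    weights /= weights.sum()
    p = np.zeros(vocab_size)
    for w, tok in zip(weights, values[nearest]):
        p[tok] += w                                    # mass on retrieved next tokens
    return p


def combine(p_lm, p_retrieval, lam=0.25):
    # fixed-weight interpolation as a stand-in for the paper's Bayesian combination
    return lam * p_retrieval + (1.0 - lam) * p_lm
```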

    Learning the Relation between Similarity Loss and Clustering Loss in Self-Supervised Learning

    Full text link
    Self-supervised learning enables networks to learn discriminative features from the massive data itself, without labels. Most state-of-the-art methods maximize the similarity between two augmentations of one image based on contrastive learning. By exploiting the consistency of the two augmentations, the burden of manual annotation is removed. Contrastive learning exploits instance-level information to learn robust features. However, the learned information is probably confined to different views of the same instance. In this paper, we attempt to leverage the similarity between two distinct images to boost representations in self-supervised learning. In contrast to instance-level information, the similarity between two distinct images may provide more useful information. Besides, we analyze the relation between similarity loss and feature-level cross-entropy loss. These two losses are essential for most deep learning methods, yet the relation between them is not clear. Similarity loss helps obtain instance-level representations, while feature-level cross-entropy loss helps mine the similarity between two distinct images. We provide theoretical analyses and experiments to show that a suitable combination of these two losses achieves state-of-the-art results. Code is available at https://github.com/guijiejie/ICCL.
    Comment: This paper is accepted by IEEE Transactions on Image Processin
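    As a rough sketch of combining the two losses, assuming PyTorch: a cosine-similarity loss pulls the two augmented views together, and a feature-level cross-entropy loss matches their soft assignments to a set of prototypes. The weighting alpha and the prototype-based assignment are illustrative; the paper's exact formulation may differ.

```python
# Illustrative combination of a similarity loss and a feature-level cross-entropy loss.
import torch
import torch.nn.functional as F


def combined_loss(z1, z2, prototypes, alpha=0.5, temp=0.1):
    """z1, z2: (B, d) embeddings of two augmentations of a batch; prototypes: (K, d)."""
    # similarity loss: pull the two views of each image together
    sim_loss = 1.0 - F.cosine_similarity(z1, z2, dim=1).mean()

    # feature-level cross-entropy: view 2's soft assignment supervises view 1
    logits1 = F.normalize(z1, dim=1) @ F.normalize(prototypes, dim=1).t() / temp
    logits2 = F.normalize(z2, dim=1) @ F.normalize(prototypes, dim=1).t() / temp
    targets = F.softmax(logits2, dim=1).detach()
    ce_loss = -(targets * F.log_softmax(logits1, dim=1)).sum(dim=1).mean()

    return alpha * sim_loss + (1.0 - alpha) * ce_loss
```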

    LawBench: Benchmarking Legal Knowledge of Large Language Models

    Full text link
    Large language models (LLMs) have demonstrated strong capabilities in various aspects. However, when applying them to the highly specialized, safety-critical legal domain, it is unclear how much legal knowledge they possess and whether they can reliably perform legal-related tasks. To address this gap, we propose LawBench, a comprehensive evaluation benchmark. LawBench has been meticulously crafted to precisely assess LLMs' legal capabilities at three cognitive levels: (1) Legal knowledge memorization: whether LLMs can memorize needed legal concepts, articles and facts; (2) Legal knowledge understanding: whether LLMs can comprehend entities, events and relationships within legal text; (3) Legal knowledge applying: whether LLMs can properly utilize their legal knowledge and make the necessary reasoning steps to solve realistic legal tasks. LawBench contains 20 diverse tasks covering 5 task types: single-label classification (SLC), multi-label classification (MLC), regression, extraction and generation. We perform extensive evaluations of 51 LLMs on LawBench, including 20 multilingual LLMs, 22 Chinese-oriented LLMs and 9 legal-specific LLMs. The results show that GPT-4 remains the best-performing LLM in the legal domain, surpassing the others by a significant margin. While fine-tuning LLMs on legal-specific text brings certain improvements, we are still a long way from obtaining usable and reliable LLMs for legal tasks. All data, model predictions and evaluation code are released at https://github.com/open-compass/LawBench/. We hope this benchmark provides an in-depth understanding of LLMs' domain-specific capabilities and speeds up the development of LLMs in the legal domain.
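    As a hypothetical sketch of how per-task scores in such a benchmark might be aggregated by cognitive level and task type, see below; the data structures and helper names are illustrative and not the released LawBench evaluation code.

```python
# Illustrative aggregation of per-task scores by cognitive level and task type.
from collections import defaultdict
from statistics import mean


def aggregate(scores, task_meta):
    """scores: {task_id: score}; task_meta: {task_id: (cognitive_level, task_type)}."""
    by_level, by_type = defaultdict(list), defaultdict(list)
    for task_id, score in scores.items():
        level, task_type = task_meta[task_id]
        by_level[level].append(score)
        by_type[task_type].append(score)
    return ({k: mean(v) for k, v in by_level.items()},
            {k: mean(v) for k, v in by_type.items()})
```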